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Abstract 

Background: Genomic selection estimates genetic merit based on dense SNP (single nucleotide polymorphism) 
genotypes and phenotypes. This requires that SNPs explain a large fraction of the genetic variance. The objectives 
of this work were: (1) to estimate the fraction of genetic variance explained by dense genome-wide markers using 
54 K SNP chip genotyping, and (2) to evaluate the effect of alternative marker-based relationship matrices and 
corrections for the base population on the fraction of the genetic variance explained by markers. 

Methods: Two alternative marker-based relationship matrices were estimated using 35 706 SNPs on 1086 dairy 
bulls. Both pedigree- and marker-based relationship matrices were fitted simultaneously or separately in an animal 
model to estimate the fraction of variance not explained by the markers, i.e. the fraction explained by the pedigree. 
The phenotypes considered in the analysis were the deregressed estimated breeding values (dEBV) for milk, fat and 
protein yield and for somatic cell score (SCS). 

Results: When dEBV were not sufficiently accurate (50 or 70%), the estimated fraction of the genetic variance 
explained by the markers was around 65% for yield traits and 45% for SCS. Scaling marker genotypes with 
locus-specific frequencies of heterozygotes slightly increased the variance explained by markers, compared with 
scaling with the average frequency of heterozygotes across loci. The estimated fraction of the genetic variance 
explained by the markers using separately both relationships matrices followed the same trends but the results 
were underestimated. With less accurate dEBV estimates, the fraction of the genetic variance explained by markers 
was underestimated, which is probably an artifact due to the dEBV being estimated by a pedigree-based animal 
model. 

Conclusions: When using only highly accurate dEBV, the proportion of the genetic variance explained by the 
lllumina 54 K SNP chip was approximately 80% for Brown Swiss cattle. These results depend on the SNP chip used 
and the family structure of the population, i.e. more dense SNPs and closer family relationships are expected to 
result in a higher fraction of the variance explained by the SNPs. 



Background 

Genome-wide dense marker arrays that are available for 
livestock populations cover all chromosomes with dense 
single nucleotide polymorphism (SNP) markers [1]. 
Many dairy cattle populations are currently being geno- 
typed using these arrays [2-4]. The main objective is to 
apply genomic selection (GS) [5]. GS allows prediction 
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of the genetic merit of young animals based on marker 
information in the absence of own performance data. 
The marker effects are estimated in a reference popula- 
tion, which must have both genotypic and phenotypic 
records. In the case of dairy bulls, phenotypic data come 
from genetic evaluations in the form of daughter yield 
deviation (DYD) or deregressed estimated breeding 
values (dEBV) [6]. 

Identity by descent (IBD) alleles refer to alleles that 
descend from a common ancestor in the base popula- 
tion [7]. The coefficient of coancestry between two ani- 
mals is defined as the probability that two randomly 
sampled alleles from the two animals are IBD [8], and 
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twice the coancestry is defined as their numerator rela- 
tionship [8]. This approach leads to the estimation of a 
matrix of relationships based on the pedigree informa- 
tion. The latter is fundamental to estimate the genetic 
parameters for complex traits such as heritability 
(defined as the proportion of the phenotypic variance 
in a population that is attributed to additive genetic 
effects). The relationship matrix based on pedigree 
data dates back to a base population, for which pa- 
rents are unknown and which is considered unrelated, 
unselected and non-inbred. The choice of the base 
population affects the estimate of the additive genetic 
variance [9]. 

However, the relationship matrix can also be esti- 
mated from genome-wide genetic markers such as 
panels of SNPs [10-12]. Methods have been developed 
to construct such marker-based relationship matrices 
[12-15]. Recently, these relationship matrices have been 
used to dissect the additive genetic variance of complex 
traits [16]. 

The proportion of the genetic variance not captured by 
markers (C miss ) represents the variance that cannot be 
used by GS and affects the maximum accuracy that can 
be achieved by GS [17]. The term 'missing heritability' 
[18] describes the fact that marker-phenotype associa- 
tions identified in genome-wide association studies do 
not explain all the genetic variance in complex traits (e.g. 
height in humans). Some strategies have been proposed 
to reduce C miss : (1) increasing the sample size in order to 
also detect genes with smaller effects, (2) expanding the 



studies to non-European samples in human genetics, (3) 
enlarging the collection of phenotypes to explore gene- 
gene interactions, (4) changing the structure of the train- 
ing population, mainly in terms of the relatedness of the 
included individuals, and (5) moving to the genomic se- 
lection approach instead of estimating the marker effect 
for each SNP individually [13,19,20]. In animal breeding, 
some results suggest that the Illumina Bovine54K chip 
array (Illumina Inc., San Diego, CA) does not capture all 
the additive genetic variation for all dairy traits [21-23], 
even when using the GS approach, it estimates simultan- 
eously all the SNP effects. 

The main objective of this study was to estimate the 
fraction of the genetic variance not explained by the 
54 K Illumina SNP chip. Two alternative marker-based 
relationship matrices were used for analysis. 

Methods 

Genotypic and phenotypic data 

A total of 1092 Italian Brown Swiss bulls were genotyped 
with the Illumina Bovine54K chip (Illumina Inc., San 
Diego, CA). These bulls were born between 1963 and 
2002. Figure 1 shows the distribution of the genotyped 
bulls over the birth years. All the SNPs on the X- 
chromosome were excluded from the analysis, which left 
51 582 markers. The quality control process removed 
1421 SNPs that had more than 5% missing genotypes 
and 14 455 SNPs with a minor allele frequency lower 
than 5%. Six sires were deleted because their genotyping 
rate was lower than 95%. Editing was performed with 
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Figure 1 Distribution of birth years of the 1092 genotypes Italian Brown Swiss bulls. 
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two different software packages: SAS* (SAS Inst. Inc., 
Cary, NC) and PUNK vl.07 [24]. At the end of the qual- 
ity control process, genotypes were available for 1086 
sires with 35 706 SNPs and with a missing genotype rate 
of 0.66%. 

The phenotypic data available were the EBV for fat 
yield (FAT), milk yield (MILK), protein yield (PROT) 
and somatic cell score in milk (SCS) for each bull, which 
were calculated by the Italian National Association of 
Brown Swiss (ANARB). The EBV were deregressed as 
proposed by Garrick [21], in order to eliminate the 
shrinkage contained in the EBV and to remove ancestral 
information. The deregressed EBV (dEBV) were used as 
phenotypic records for the bulls with heritability equal 
to the reliability of the EBV. 

Three subsets were formed according to the reliability 
of EBV as follows: animals with a reliability of at least 
50% for each trait; animals with a reliability greater than 
70% for each trait; animals with a reliability of at least 
90% for each trait. 

Relationship matrices: A and G 

A pedigree file was extracted from the Italian Brown 
Swiss herd book. Pedigree was traced back five genera- 
tions and the pedigree file included 6826 entries. The 
completeness in the pedigree was 100% up to the grand- 
parents, and decreased to -90% thereafter. The equiva- 
lent number of known generations as calculated by the 
software Pedig [25] was on average 5.14 and the median 
was 5.23. The pedigree file was used to estimate the 
additive genetic relationships (A) with an adapted ver- 
sion of the procedure proposed by Meuwissen and Luo 
[26], as implemented in ASREML [27]. 

Two genomic relationship matrices (G) were com- 
puted for all genotyped animals. The first Gy was based 
on the method proposed by VanRaden [12]. Let M be 
the marker-genotype matrix with number of individuals 
(n) and number of loci (m) as dimensions. The elements 
in the matrix M were coded as -1 (homozygous for one 
allele) 0 (heterozygous) and 1 for (homozygous for the 
other allele). The nxm matrix P contains columns with 
all elements 2{p r 0.5), where pi is the frequency of the 
second allele at locus i. The matrix P was subtracted 
from M to give Z = M - P. Finally, matrix Gy was calcu- 
lated as: 



ZZ 



The second genomic relationship matrix (G Y ) was 
computed as: 

WW 



where W is the Z matrix but with each element scaled 
based on the allele frequency of each locus as follows: 
Wi j = T _ z * = [12,14]. 



Correction for the base population 

Both the G matrix and the pedigree-based relationship 
matrix, A, are expressed relative to a base population, i. 
e. an original population in which all animals are as- 
sumed unrelated and non-inbred, and these populations 
may differ between the pedigree-based and genomic re- 
lationship matrices [15]. To correct for these differences, 
the scale of G was changed to that of A based on 
Wright's F-statistic [7]. We expressed the total inbreed- 
ing of animal i in the G matrix as: 

Fit = G u -1 or F it = Fst + 0-~F st ) F is , 

where F st is the average inbreeding in the population, i.e. 
the average of the diagonal elements of G minus 1, and 
F is is the inbreeding of animal i relative to the popula- 
tion average inbreeding F st , which is calculated as: 



(GlI-l-Fg) 



F U 

The average population inbreeding of G was set equal 
to that of A by rescaling the diagonal element of G cor- 
responding to individual i as: 

G*j = A st + {l-A st )F st + 1, 

Where A st is the average of the diagonals of A minus 
1. The off-diagonals of G were rescaled similarly, using 
the same F st and A st values. Numerator relationships 
were transformed to kinships, 0, i.e. by dividing the rela- 
tionship by 2, and performing the base-correction on the 
kinship level, which is the same level as that of inbreed- 
ing, i.e. 



0n 



and 



m 



G*i = 2[A st + (l-A st )0 jis ], 

where 0 /IS is the kinship of animal / and i relative to the 
base population inbreeding, F st . 

Estimation of variance components 

To estimate the fraction of the genetic variance captured 
by dense markers covering the entire genome, the ap- 
proach of Goddard et al. [28] was used. Both matrix A 
and G were fitted in the model simultaneously in order 
to estimate the fraction of the genetic variance captured 
by each of these matrices. The variance component ana- 
lyses were performed by ASREML-R [29], using the fol- 
lowing model: 

y = lu + Z x a + Z 2 u + e, 
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where y is the vector of the dEBV; u is the overall mean; 
Zi and Z 2 are the incidence matrices for pedigree-based 
and genomic random animal effects, respectively; a is 
the vector of the random additive genetic animal effects 
using the pedigree-based relationship matrix, with a ~ N 
(0, AOa); u is the vector of random additive genetic effect 
using the genomic relationship matrix, with u ~ N(0, 
Ga„); and finally, e is the vector of random residual ef- 
fects. Because the number of daughters per bull was 
high for all bulls, the reliabilities of the dEBV were high 
and varied little between bulls, and a homogeneous error 
variance structure was assumed. 

If we assume that A is an unbiased estimate of G, and 
write G = A + D [28], where D is a matrix of deviations 
from pedigree relationships due to the segregation of a 
finite number of chromosome segments in the genome, 
the genetic variance of the records becomes V(g) = Go„ 
+ Aal = A(Cu + of) + DOu- Hence, as in a model that fits 
only pedigree relationships (y = lu + Zja + e), the total 
genetic variance is explained by the A matrix and the 
segregation of chromosome segments that are traced by 
the markers is explained by of,. The fraction of genetic 
variance not captured by the markers on the SNP chip 
(C m iss) was thus estimated as: 

2 2 

a a 

C i _ i u 

°J \ a l + a D 

where is the total genetic variance, of is the variance 
due to marker-based relationships and of is the variance 
due to pedigree-based relationships. 

The two additive genetic variances were also estimated 
by fitting each separately: the additive genetic animal 
variance using the pedigree-based relationship matrix 
( of q ) and the additive genetic variance using the gen- 
omic relationship matrix (o^ 0 ). The estimate of of 0 was 
used to calculate an alternative estimate for the fraction 
of genetic variance not addressed by the markers on the 

a 2 

SNP chip (C miss2 ) as follows: C miss2 = 1 - -f . The es- 

timate C, msS 2 has the advantage that ofo is known to 
yield an unbiased estimate of the genetic variance, but it 
has the disadvantage that ofo is likely to include more 
genetic variance than that explained by QTL that are in 
LD with the markers [11]. E.g. if only some of the chro- 
mosomes contain markers, these markers can explain 
genetic variance at the unmarked chromosomes, because 
the markers trace family relationships. If, in the latter 
case, the pedigree-based relationship matrix is fitted simul- 
taneously with the marker-based relationship matrix, the 
variance due to the unmarked chromosomes is expected to 
be included in the polygenic variance, of, because the 
pedigree-based relationship matrix more closely resembles 
the family relationships at the unmarked chromosomes 



than at the marked chromosomes, which may show rela- 
tionships that (randomly) deviate from the pedigree. Thus, 
C m iss2 is expected to underestimate the fraction of missing 
genetic variance. 

Results 

Descriptive statistics 

Descriptive statistics for each trait and dataset are in 
Table 1. In the group of bulls with dEBV reliabilities of at 
least 50%, the dEBV average reliability was -90% (±7%) 
for the production traits (FAT, PROT and MILK), and 
82.6% (±10.7%) for SCS. The subset of sires with dEBV 
reliabilities of at least 70% had a similar average reliability 
of ~91% (±5%) for the production traits. The lowest aver- 
age reliability in this subset was 85.7% (±7.4%) for SCS. 
Finally, the subset of bulls with reliabilities of at least 
90% had an average reliability close to -94% (±3%) for all 
traits. As expected, the differences in the average of the 
reliabilities between traits tended to decrease with in- 
creasing minimum reliability requirements. 

Proportion of genetic variance not explained by markers 

The fraction of genetic variance not explained by mo- 
lecular markers based on C m;ss was estimated for all 
datasets (50, 70 and 90 dEBV reliabilities) and traits 
(FAT, PROT, MILK and SCS). Results are in Table 2. For 
dFAT50, the estimate of C TOISS was 0.373 ± 0.068 based 
on Gy and 0.363 ± 0.069 based on Gy- The estimates of 
C m i ss were smaller for the dFAT70 subset than for the 
dFAT50 subset. For dFAT90, the estimate was 0.305 ± 
0.074 Gy, while the Gy matrix did not result in con- 
verged variance component estimates. Algorithms other 
than the AI-REML algorithm might have converged (e.g. 
the EM-algorithm, which is known to be slow), but the 
convergence difficulties are probably due to the small 
size of the dataset, thus resulting variance component 
estimates would have been unreliable. 

The fraction of the genetic variance not explained by 
molecular markers based on C mlsS 2 through the additive 
genetic variances was estimated separately for all data- 
sets and traits (Table 3). Results for C misS 2 followed the 
same trends as for C mlJS but the values of C„ 3IJS 2 were 
lower probably due to its underestimation of the fraction 
of the missing genetic variance. 

Results for dMILK, dPROT and dSCS were similar to 
those described above for dFAT for both genomic rela- 
tionship matrices. Estimates of C miss for dMILK70 and 
dPROT70 hardly differed from those for dMILK50 and 
dPROT50, respectively. The subsets with dEBV90 re- 
sulted in estimates of C miss of 0.199 (±0.101) for 
dMILK90 and 0.206 (±0.098) for dPROT90 when using 
Gy. These estimates were not significantly different from 
those obtained with the larger datasets for the same 
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Table 1 Descriptive statistics for de-regressed estimated breeding values (dEBV) and reliabilities (r 2 ) for 
production traits* 



Trait 


Subset 
label 


Number of 
observations 


dEBV 
Mean 


SO 


r 2 (%) 
Mean 


SD 


Fat yield 


dFAT50 


1034 


-8.1 


26.3 


90.2 


7.4 




dFAT70 


1006 


-8.7 


26.1 


91.0 


5.8 




dFAT90 


655 


-12.7 


25.4 


94.3 


2.9 


Milk yield 


dMILK50 


1034 


-205.9 


666.9 


90.7 


7.3 




dMILK70 


1014 


-214.7 


665.6 


91.4 


5.6 




dMILK90 


691 


-316.1 


646.9 


94.4 


2.9 


Protein yield 


dPROT50 


1034 


-8.2 


23.3 


90.6 


7.1 




dPROT70 


1009 


-8.7 


23.2 


91.3 


5.6 




dPROT90 


681 


-12.1 


22.9 


94.4 


2.9 


Somatic cell score 


dSCS50 


978 


0.246 


1.206 


82.6 


10.7 




dSCS70 


848 


0.233 


1.118 


85.7 


7.4 




dSCS90 


223 


0.018 


0.972 


95.2 


2.9 


"Subsets of the genotyped 


sire population were 


divided based on minimum 


reliabilities (50, 70, or 90); SD: standard deviation. 







traits (dEBV50 or dEBV70), although they were system- 
atically lower for all traits. 

The highest estimates for C m i SS were obtained for 
dSCS50, with 0.532 (±0.091) for G v - When using G Y , the 
corresponding C mlss estimate was lower (0.486 ± 0.095). 
The smallest C m , ss estimate was obtained for dSCS90: 
0.061 (±0.197) using G Y - The variance component ana- 
lysis with Gy on the same dataset did not converge. This 
was the smallest dataset and, although the average reli- 
ability was the highest, estimates of C miss were not sig- 
nificantly different from 0. 

Table 2 Proportion of genetic variance not explained by 
markers (C m/ss ) ± standard error (SE) for dEBV for 
production traits* 1 



Label 


G Y 


G v 


dFAT50 


0.363 ± 0.069 


0.373 ± 0.068 


dFAT70 


0.363 ± 0.072 


0.369 ± 0.070 


dFAT90 


NC 


0.305 ± 0.074 


dMILK50 


0.337 ±0.076 


0.357 ±0.074 


dMILK70 


0.342 ± 0.077 


0.358 ±0.075 


dMILK90 


0.1 99 ±0.1 01 


0.245 ± 0.098 


dPROT50 


0.345 ± 0.077 


0.363 ±0.074 


dPROT70 


0.344 ± 0.078 


0.357 ±0.076 


dPROT90 


0.206 ± 0.098 


0.235 ± 0.095 


dSCS50 


0.486 ± 0.095 


0.532 ±0.091 


dSCS70 


0.492 ±0.1 01 


0.530 ±0.097 


dSCS90 


0.061 ±0.197 


NC 



•Subsets of the genotyped sire population were divided based on minimum 
reliabilities (50, 70, or 90); NC: Log-likelihood was not available since the 
iterative procedure was not convergent; 1 G Y , and G v : genomic relationship 
matrices as proposed by [14] and [12], respectively, and corrected to the same 
base population. 



In general, estimates of C miss2 decreased as the reliabil- 
ity of the dEBV increased. Estimates of C m;sS 2 differed 
from estimates of C mlM , probably because C m i SS 2 is ex- 
pected to underestimate the fraction of the missing gen- 
etic variance. 

Discussion 

We estimated the fraction of the genetic variance not 
accounted by SNPs in the marker panel (C mlss ) based on 
the Illumina 54 K SNP chip for complex traits in dairy 
cattle. The results showed that the estimates of C mlM 
depended on the reliability of the phenotypic traits con- 
sidered, i.e. the dEBV used as response values. When the 
accuracy of the dEBV increases, i.e. when the correlation 

Table 3 Proportion of genetic variance not explained by 
markers [C miss2 ) for dEBV for production traits* 1 



Label 


Gy 


G v 


dFAT50 


0.116 


0.097 


dFAT70 


0.108 


0.089 


dFAT90 


0.026 


0.024 


dMILK50 


0.073 


0.055 


dMILK70 


0.075 


0.057 


dMILK90 


0.125 


0.101 


dPROT50 


0.054 


0.035 


dPROT70 


0.052 


0.031 


dPROT90 


0.031 


0.008 


dSCS50 


0.149 


0.149 


dSCS70 


0.152 


0.152 


dSCS90 


-0.024 


-0.024 



^Subsets of the genotyped sire population were divided based on minimum 
reliabilities {50, 70, or 90); ''Gy and G v : genomic relationship matrices as 
proposed by [14] and [12], respectively. 
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between dEBV and the true breeding value increases, the 
proportion of the genetic variance explained by SNPs 
tended to increase. When the reliability of the dEBV is 
low, the family/pedigree information greatly contributes 
to the estimation of the EBV, which results in a larger 
fraction of the variance being explained by A and, in 
turn, in upward biases of C m i ss . Because the estimates of 
the C m i SS values, are expected to be overestimated due to 
the use of (family information in) dEBV, the best esti- 
mates of C m iss ar e obtained for data sets with high reliabil- 
ities, which resulted in estimates around 0.2. This implies 
that the maximum accuracy of GEBV is V(l-C m£ss ) ~ 0.9, 
which agrees with the result of Daetwyler [22], who stud- 
ied the increase in the accuracy of GEBV with increasing 
training population sizes. 

For all production traits, the fraction of the genetic 
variance not explained by the SNPs was significantly dif- 
ferent from 0, even when the phenotypes were very ac- 
curate (reliability > 90%), and were, therefore, very close 
to the true breeding values. Correction for the base 
population did not affect the fraction of the genetic vari- 
ance explained by markers for any of the marker-based 
relationships here used. The differences in C m i ss estimates 
between using G v and G Y were negligible for all traits and 
all subsets. Similarly, when using EBV instead of dEBV 
(results not shown), the results were virtually the same. 

If original performance records of production and SCS 
phenotypes are used to estimates C miss , instead of dEBV, 
the upward biases mentioned above are not expected to 
occur. The error variances would be higher than when 
using dEBV, but the value of oi would not be inflated, 
because family information does not contribute to own 
phenotype (in contrast to dEBV phenotypes). 

The sources of phenotypic information used in gen- 
omic analyses are very heterogeneous and vary from in- 
dividuals with highly reliable information, i.e. progeny- 
tested bulls, and animals with phenotypes with low 
levels of accuracy, i.e. young cows. To take into account 
these differences in reliability in a weighted analysis, it is 
necessary to know the value of C m i ss for each phenotype 
[22]. In addition, a polygenic effect must be included in 
the model to account for unmarked genetic effects. 
Knowledge of the fraction of the genetic variance not ex- 
plained by markers is also required to predict the accur- 
acy of the genomic predictions for each individual in the 
population, since it affects the maximum accuracy that 
can be achieved [17]. 

The base population correction of the genomic rela- 
tionship matrix generally affected neither the proportion 
of genetic variance captured by markers, nor the genetic 
variance captured by the pedigree-based relationship 
matrices, which agrees with [17,30] but not with [31]. 
The latter authors, however, scaled the relationships in 
the opposite direction, i.e. when G relationships were too 



high, they scaled all relationships downwards, which fur- 
ther decreased the differences in relationships that were 
already small since relationships are bound by a max- 
imum of 1 (and vice-versa when G relationships were too 
small). Moreover, the correction for the base population 
facilitates the integration of relationship matrices A and 
G into a single matrix (H), according to Legarra et al. 
[32], Christensen and Lund [13], and Meuwissen et al. 
[15]. 

We also estimated C mKS 2 using the pedigree-based es- 
timate of genetic variance. The denominators of C mlM 
and C miss2 were significantly different from each other 
but both estimates revealed that the genomic relation- 
ship matrix could explain more than 95% of genetic vari- 
ance if sufficiently reliable phenotypes are used (with 
reliabilities greater than 95%). 

It should be noted that the estimates of C m;ss and 
Cmiss2 depend on the SNP chip used, i.e. more dense 
SNP chips are expected to yield lower estimates of C mlM 
and C m i SS 2 (a larger fraction of the variance is explained 
by the SNPs), and also on the family structure of the 
population [33]. Populations with more closely related 
individuals are expected to yield high LD between SNPs 
and QTL, even when they are physically quite far apart 
and, therefore, lower estimates of C mlss . The population 
structure of the Italian Brown Swiss population reflects 
that of a typical dairy breeding population, and, thus, 
our results probably apply also to other dairy breeding 
populations. 

Conclusions 

The fraction of genetic variance explained by genetic 
markers from high-density SNP panels was significantly 
different from 0 for the complex traits analyzed when 
the phenotypes are not highly accurate. The minimum 
fraction of the genetic variance not explained by the 
markers (C miss ) was equal to 0.2, which was estimated 
based on the most accurate phenotypes. This value 
agrees with other values reported in the literature. Cor- 
rection of the genomic relationship matrix for the vari- 
ance of the allele frequency of each locus (G Y ) instead of 
the average frequency of heterozygotes (G v )i hardly ex- 
plained any additional genetic variance. Our estimate of 
C m i ss °f 0-2 implies that about 80% of the genetic vari- 
ance is explained by the Illumina 54 K SNP chip. Values 
for C miss are expected to depend on the density of the 
chip (a larger SNP chip is expected to explain a larger 
fraction of the genetic variance) and on family relation- 
ships in the population, i.e. closer family relationships 
are expected to reduce C miss . 
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